Dataset statistics
| Number of variables | 12 |
|---|---|
| Number of observations | 8523 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 799.2 KiB |
| Average record size in memory | 96.0 B |
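The average record size follows from the total memory figure and the number of observations. The sketch below re-derives it from the values in the table; the 8 bytes per value is an assumption (it would hold if every column were stored as a 64-bit number), not something the report states.

```python
# Figures copied from the "Dataset statistics" table.
n_rows = 8523
n_columns = 12
total_kib = 799.2

# Average bytes per record = total bytes / number of rows.
avg_record_bytes = total_kib * 1024 / n_rows

# Assumption: each of the 12 columns holds one 8-byte value per row.
assumed_bytes_per_value = 8
expected_record_bytes = n_columns * assumed_bytes_per_value

print(round(avg_record_bytes, 1))  # 96.0
print(expected_record_bytes)       # 96
```

Under that assumption the two numbers agree, which suggests the small remainder in the total is fixed overhead rather than per-row data.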
Variable types
| Numeric | 8 |
|---|---|
| Categorical | 4 |
Warnings
| Message | Type |
|---|---|
| Item_MRP is highly correlated with Item_Outlet_Sales | High correlation |
| Outlet_Identifier is highly correlated with Outlet_Size and 1 other field | High correlation |
| Outlet_Size is highly correlated with Outlet_Identifier and 1 other field | High correlation |
| Outlet_Location_Type is highly correlated with Outlet_Identifier and 1 other field | High correlation |
| Item_Outlet_Sales is highly correlated with Item_MRP | High correlation |
| Item_MRP is highly correlated with Item_Outlet_Sales | High correlation |
| Outlet_Identifier is highly correlated with Outlet_Location_Type | High correlation |
| Outlet_Size is highly correlated with Outlet_Location_Type | High correlation |
| Outlet_Location_Type is highly correlated with Outlet_Identifier and 1 other field | High correlation |
| Item_Outlet_Sales is highly correlated with Item_MRP | High correlation |
| Outlet_Identifier is highly correlated with Outlet_Location_Type | High correlation |
| Outlet_Size is highly correlated with Outlet_Location_Type | High correlation |
| Outlet_Location_Type is highly correlated with Outlet_Identifier and 1 other field | High correlation |
| Item_Weight is highly correlated with Outlet_Identifier and 1 other field | High correlation |
| Outlet_Location_Type is highly correlated with Outlet_Identifier and 3 other fields | High correlation |
| Item_MRP is highly correlated with Item_Outlet_Sales | High correlation |
| Outlet_Identifier is highly correlated with Item_Weight and 5 other fields | High correlation |
| Item_Identifier is highly correlated with Item_Type | High correlation |
| Outlet_Type is highly correlated with Item_Weight and 3 other fields | High correlation |
| Item_Type is highly correlated with Item_Identifier | High correlation |
| Outlet_Size is highly correlated with Outlet_Location_Type and 2 other fields | High correlation |
| Outlet_Establishment_Year is highly correlated with Outlet_Location_Type and 3 other fields | High correlation |
| Item_Outlet_Sales is highly correlated with Item_MRP and 1 other field | High correlation |
| Outlet_Type is highly correlated with Outlet_Location_Type | High correlation |
| Outlet_Location_Type is highly correlated with Outlet_Type | High correlation |
| Item_Visibility has 526 (6.2%) zeros | Zeros |
| Item_Type has 648 (7.6%) zeros | Zeros |
| Outlet_Identifier has 555 (6.5%) zeros | Zeros |
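The "Zeros" figures are the count of entries equal to zero and that count as a share of all 8523 observations. A minimal sketch of the calculation, using a made-up column that only mimics the Item_Visibility counts (the helper name `zero_share` is hypothetical):

```python
def zero_share(values):
    """Return (count, percent) of entries equal to zero."""
    zeros = sum(1 for v in values if v == 0)
    return zeros, 100.0 * zeros / len(values)

# Hypothetical column shaped like Item_Visibility: 526 zeros among 8523 rows.
column = [0.0] * 526 + [0.05] * (8523 - 526)
count, percent = zero_share(column)
print(f"{count} ({percent:.1f}%) zeros")  # 526 (6.2%) zeros
```

If Item_Type and Outlet_Identifier are integer-encoded categories, as their small distinct counts suggest, their zeros are probably just the code of one category rather than a data-quality signal.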
Reproduction
| Analysis started | 2021-07-31 05:37:43.869381 |
|---|---|
| Analysis finished | 2021-07-31 05:38:23.882976 |
| Duration | 40.01 seconds |
| Software version | pandas-profiling v3.0.0 |
| Download configuration | config.json |
Item_Identifier
Real number (ℝ≥0)
| Distinct | 1559 |
|---|---|
| Distinct (%) | 18.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 779.7148891 |
| Minimum | 0 |
| Maximum | 1558 |
| Zeros | 6 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 66.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 77.1 |
| Q1 | 395.5 |
| median | 783 |
| Q3 | 1167 |
| 95-th percentile | 1477 |
| Maximum | 1558 |
| Range | 1558 |
| Interquartile range (IQR) | 771.5 |
Descriptive statistics
| Standard deviation | 449.2223766 |
|---|---|
| Coefficient of variation (CV) | 0.5761367172 |
| Kurtosis | -1.195555354 |
| Mean | 779.7148891 |
| Median Absolute Deviation (MAD) | 386 |
| Skewness | -0.008877177849 |
| Sum | 6645510 |
| Variance | 201800.7436 |
| Monotonicity | Not monotonic |
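Several of the reported statistics are derived from one another, which gives a quick consistency check. The sketch below uses only values copied from the quantile and descriptive tables above; the variable names are illustrative.

```python
import math

# Values copied from the statistics tables above.
mean = 779.7148891
std = 449.2223766
variance = 201800.7436
q1, q3 = 395.5, 1167
minimum, maximum = 0, 1558

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(round(cv, 4))   # 0.5761
print(iqr)            # 771.5
print(value_range)    # 1558
# The variance should be the square of the standard deviation.
print(math.isclose(std ** 2, variance, rel_tol=1e-6))  # True
```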
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 413 | 10 | 0.1% |
| 1077 | 10 | 0.1% |
| 702 | 9 | 0.1% |
| 390 | 9 | 0.1% |
| 1454 | 9 | 0.1% |
| 1542 | 9 | 0.1% |
| 750 | 9 | 0.1% |
| 1276 | 9 | 0.1% |
| 35 | 9 | 0.1% |
| 301 | 9 | 0.1% |
| Other values (1549) | 8431 | 98.9% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6 | 0.1% |
| 1 | 7 | 0.1% |
| 2 | 8 | 0.1% |
| 3 | 3 | < 0.1% |
| 4 | 5 | 0.1% |
| 5 | 4 | < 0.1% |
| 6 | 6 | 0.1% |
| 7 | 7 | 0.1% |
| 8 | 6 | 0.1% |
| 9 | 4 | < 0.1% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 1558 | 7 | 0.1% |
| 1557 | 5 | 0.1% |
| 1556 | 5 | 0.1% |
| 1555 | 5 | 0.1% |
| 1554 | 7 | 0.1% |
| 1553 | 4 | < 0.1% |
| 1552 | 7 | 0.1% |
| 1551 | 6 | 0.1% |
| 1550 | 7 | 0.1% |
| 1549 | 3 | < 0.1% |
Item_Weight
Real number (ℝ≥0)
| Distinct | 416 |
|---|---|
| Distinct (%) | 4.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12.85764518 |
| Minimum | 4.555 |
| Maximum | 21.35 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 66.7 KiB |
Quantile statistics
| Minimum | 4.555 |
|---|---|
| 5-th percentile | 6.13 |
| Q1 | 9.31 |
| median | 12.85764518 |
| Q3 | 16 |
| 95-th percentile | 20.19 |
| Maximum | 21.35 |
| Range | 16.795 |
| Interquartile range (IQR) | 6.69 |
Descriptive statistics
| Standard deviation | 4.226123725 |
|---|---|
| Coefficient of variation (CV) | 0.3286856702 |
| Kurtosis | -0.8602944788 |
| Mean | 12.85764518 |
| Median Absolute Deviation (MAD) | 3.342354816 |
| Skewness | 0.09056145192 |
| Sum | 109585.7099 |
| Variance | 17.86012174 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 12.85764518 | 1463 | 17.2% |
| 12.15 | 86 | 1.0% |
| 17.6 | 82 | 1.0% |
| 13.65 | 77 | 0.9% |
| 11.8 | 76 | 0.9% |
| 15.1 | 68 | 0.8% |
| 9.3 | 68 | 0.8% |
| 16.7 | 66 | 0.8% |
| 10.5 | 66 | 0.8% |
| 19.35 | 63 | 0.7% |
| Other values (406) | 6408 | 75.2% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 4.555 | 4 | < 0.1% |
| 4.59 | 5 | 0.1% |
| 4.61 | 7 | 0.1% |
| 4.615 | 4 | < 0.1% |
| 4.635 | 5 | 0.1% |
| 4.785 | 5 | 0.1% |
| 4.805 | 4 | < 0.1% |
| 4.88 | 5 | 0.1% |
| 4.905 | 2 | < 0.1% |
| 4.92 | 5 | 0.1% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 21.35 | 7 | 0.1% |
| 21.25 | 24 | 0.3% |
| 21.2 | 5 | 0.1% |
| 21.1 | 17 | 0.2% |
| 21 | 6 | 0.1% |
| 20.85 | 35 | 0.4% |
| 20.75 | 39 | 0.5% |
| 20.7 | 62 | 0.7% |
| 20.6 | 38 | 0.4% |
| 20.5 | 44 | 0.5% |
Item_Fat_Content
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 66.7 KiB |

| Value | Count |
|---|---|
| 1 | 5089 |
| 2 | 2889 |
| 0 | 316 |
| 4 | 117 |
| 3 | 112 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 8523 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 2 |
| 3rd row | 1 |
| 4th row | 2 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 5089 | 59.7% |
| 2 | 2889 | 33.9% |
| 0 | 316 | 3.7% |
| 4 | 117 | 1.4% |
| 3 | 112 | 1.3% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 5089 | 59.7% |
| 2 | 2889 | 33.9% |
| 0 | 316 | 3.7% |
| 4 | 117 | 1.4% |
| 3 | 112 | 1.3% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 5089 | 59.7% |
| 2 | 2889 | 33.9% |
| 0 | 316 | 3.7% |
| 4 | 117 | 1.4% |
| 3 | 112 | 1.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 8523 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 5089 | 59.7% |
| 2 | 2889 | 33.9% |
| 0 | 316 | 3.7% |
| 4 | 117 | 1.4% |
| 3 | 112 | 1.3% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 8523 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 5089 | 59.7% |
| 2 | 2889 | 33.9% |
| 0 | 316 | 3.7% |
| 4 | 117 | 1.4% |
| 3 | 112 | 1.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 8523 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 5089 | 59.7% |
| 2 | 2889 | 33.9% |
| 0 | 316 | 3.7% |
| 4 | 117 | 1.4% |
| 3 | 112 | 1.3% |
Item_Visibility
Real number (ℝ≥0)
| Distinct | 7880 |
|---|---|
| Distinct (%) | 92.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.06613202878 |
| Minimum | 0 |
| Maximum | 0.328390948 |
| Zeros | 526 |
| Zeros (%) | 6.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 66.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.0269894775 |
| median | 0.053930934 |
| Q3 | 0.0945852925 |
| 95-th percentile | 0.1637797636 |
| Maximum | 0.328390948 |
| Range | 0.328390948 |
| Interquartile range (IQR) | 0.067595815 |
Descriptive statistics
| Standard deviation | 0.05159782232 |
|---|---|
| Coefficient of variation (CV) | 0.7802243977 |
| Kurtosis | 1.679445483 |
| Mean | 0.06613202878 |
| Median Absolute Deviation (MAD) | 0.030972154 |
| Skewness | 1.16709055 |
| Sum | 563.6432813 |
| Variance | 0.002662335268 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 526 | 6.2% |
| 0.076975118 | 3 | < 0.1% |
| 0.041283361 | 2 | < 0.1% |
| 0.085622362 | 2 | < 0.1% |
| 0.187841082 | 2 | < 0.1% |
| 0.134975628 | 2 | < 0.1% |
| 0.107223632 | 2 | < 0.1% |
| 0.085274988 | 2 | < 0.1% |
| 0.076855628 | 2 | < 0.1% |
| 0.059835659 | 2 | < 0.1% |
| Other values (7870) | 7978 | 93.6% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 526 | 6.2% |
| 0.003574698 | 1 | < 0.1% |
| 0.003589104 | 1 | < 0.1% |
| 0.003597678 | 1 | < 0.1% |
| 0.003599378 | 1 | < 0.1% |
| 0.003606726 | 1 | < 0.1% |
| 0.003612411 | 1 | < 0.1% |
| 0.005209791 | 1 | < 0.1% |
| 0.005230786 | 1 | < 0.1% |
| 0.005234153 | 1 | < 0.1% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.328390948 | 1 | < 0.1% |
| 0.325780807 | 1 | < 0.1% |
| 0.32111501 | 1 | < 0.1% |
| 0.311090379 | 1 | < 0.1% |
| 0.309390255 | 1 | < 0.1% |
| 0.308145448 | 1 | < 0.1% |
| 0.306542848 | 1 | < 0.1% |
| 0.305305397 | 1 | < 0.1% |
| 0.304859104 | 1 | < 0.1% |
| 0.304737387 | 1 | < 0.1% |
Item_Type
Real number (ℝ≥0)
| Distinct | 16 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7.226680746 |
| Minimum | 0 |
| Maximum | 15 |
| Zeros | 648 |
| Zeros (%) | 7.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 66.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 4 |
| median | 6 |
| Q3 | 10 |
| 95-th percentile | 14 |
| Maximum | 15 |
| Range | 15 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 4.209989863 |
|---|---|
| Coefficient of variation (CV) | 0.5825620379 |
| Kurtosis | -0.9662196602 |
| Mean | 7.226680746 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 0.1016546244 |
| Sum | 61593 |
| Variance | 17.72401464 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=16)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 1232 | 14.5% |
| 13 | 1200 | 14.1% |
| 9 | 910 | 10.7% |
| 5 | 856 | 10.0% |
| 4 | 682 | 8.0% |
| 3 | 649 | 7.6% |
| 0 | 648 | 7.6% |
| 8 | 520 | 6.1% |
| 14 | 445 | 5.2% |
| 10 | 425 | 5.0% |
| Other values (6) | 956 | 11.2% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 648 | 7.6% |
| 1 | 251 | 2.9% |
| 2 | 110 | 1.3% |
| 3 | 649 | 7.6% |
| 4 | 682 | 8.0% |
| 5 | 856 | 10.0% |
| 6 | 1232 | 14.5% |
| 7 | 214 | 2.5% |
| 8 | 520 | 6.1% |
| 9 | 910 | 10.7% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 15 | 148 | 1.7% |
| 14 | 445 | 5.2% |
| 13 | 1200 | 14.1% |
| 12 | 64 | 0.8% |
| 11 | 169 | 2.0% |
| 10 | 425 | 5.0% |
| 9 | 910 | 10.7% |
| 8 | 520 | 6.1% |
| 7 | 214 | 2.5% |
| 6 | 1232 | 14.5% |
Item_MRP
Real number (ℝ≥0)
| Distinct | 5938 |
|---|---|
| Distinct (%) | 69.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 140.992782 |
| Minimum | 31.29 |
| Maximum | 266.8884 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 66.7 KiB |
Quantile statistics
| Minimum | 31.29 |
|---|---|
| 5-th percentile | 42.5167 |
| Q1 | 93.8265 |
| median | 143.0128 |
| Q3 | 185.6437 |
| 95-th percentile | 250.76924 |
| Maximum | 266.8884 |
| Range | 235.5984 |
| Interquartile range (IQR) | 91.8172 |
Descriptive statistics
| Standard deviation | 62.27506651 |
|---|---|
| Coefficient of variation (CV) | 0.4416897492 |
| Kurtosis | -0.8897690937 |
| Mean | 140.992782 |
| Median Absolute Deviation (MAD) | 46.0376 |
| Skewness | 0.1272022683 |
| Sum | 1201681.481 |
| Variance | 3878.183909 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 172.0422 | 7 | 0.1% |
| 188.1872 | 6 | 0.1% |
| 170.5422 | 6 | 0.1% |
| 109.5228 | 6 | 0.1% |
| 196.5084 | 6 | 0.1% |
| 142.0154 | 6 | 0.1% |
| 196.5768 | 6 | 0.1% |
| 192.2478 | 5 | 0.1% |
| 143.2154 | 5 | 0.1% |
| 108.6912 | 5 | 0.1% |
| Other values (5928) | 8465 | 99.3% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 31.29 | 1 | < 0.1% |
| 31.49 | 1 | < 0.1% |
| 31.89 | 1 | < 0.1% |
| 31.9558 | 2 | < 0.1% |
| 32.0558 | 1 | < 0.1% |
| 32.09 | 1 | < 0.1% |
| 32.3558 | 1 | < 0.1% |
| 32.4558 | 1 | < 0.1% |
| 32.49 | 1 | < 0.1% |
| 32.6558 | 2 | < 0.1% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 266.8884 | 2 | < 0.1% |
| 266.6884 | 2 | < 0.1% |
| 266.5884 | 2 | < 0.1% |
| 266.2884 | 1 | < 0.1% |
| 266.1884 | 2 | < 0.1% |
| 266.0226 | 1 | < 0.1% |
| 265.8884 | 1 | < 0.1% |
| 265.7884 | 1 | < 0.1% |
| 265.6884 | 1 | < 0.1% |
| 265.5568 | 1 | < 0.1% |
Outlet_Identifier
Real number (ℝ≥0)
HIGH CORRELATION, ZEROS
| Distinct | 10 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.722280887 |
| Minimum | 0 |
| Maximum | 9 |
| Zeros | 555 |
| Zeros (%) | 6.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 66.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2 |
| median | 5 |
| Q3 | 7 |
| 95-th percentile | 9 |
| Maximum | 9 |
| Range | 9 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 2.837201297 |
|---|---|
| Coefficient of variation (CV) | 0.6008116343 |
| Kurtosis | -1.260779927 |
| Mean | 4.722280887 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | -0.05986138181 |
| Sum | 40248 |
| Variance | 8.049711202 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=10)
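Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 935 | 11.0% |
| 1 | 932 | 10.9% |
| 6 | 930 | 10.9% |
| 9 | 930 | 10.9% |
| 8 | 930 | 10.9% |
| 7 | 929 | 10.9% |
| 3 | 928 | 10.9% |
| 2 | 926 | 10.9% |
| 0 | 555 | 6.5% |
| 4 | 528 | 6.2% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 555 | 6.5% |
| 1 | 932 | 10.9% |
| 2 | 926 | 10.9% |
| 3 | 928 | 10.9% |
| 4 | 528 | 6.2% |
| 5 | 935 | 11.0% |
| 6 | 930 | 10.9% |
| 7 | 929 | 10.9% |
| 8 | 930 | 10.9% |
| 9 | 930 | 10.9% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 9 | 930 | 10.9% |
| 8 | 930 | 10.9% |
| 7 | 929 | 10.9% |
| 6 | 930 | 10.9% |
| 5 | 935 | 11.0% |
| 4 | 528 | 6.2% |
| 3 | 928 | 10.9% |
| 2 | 926 | 10.9% |
| 1 | 932 | 10.9% |
| 0 | 555 | 6.5% |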
Outlet_Establishment_Year
Real number (ℝ≥0)
| Distinct | 9 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1997.831867 |
| Minimum | 1985 |
| Maximum | 2009 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 66.7 KiB |
Quantile statistics
| Minimum | 1985 |
|---|---|
| 5-th percentile | 1985 |
| Q1 | 1987 |
| median | 1999 |
| Q3 | 2004 |
| 95-th percentile | 2009 |
| Maximum | 2009 |
| Range | 24 |
| Interquartile range (IQR) | 17 |
Descriptive statistics
| Standard deviation | 8.371760408 |
|---|---|
| Coefficient of variation (CV) | 0.004190422902 |
| Kurtosis | -1.205693917 |
| Mean | 1997.831867 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | -0.3966407859 |
| Sum | 17027521 |
| Variance | 70.08637233 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=9)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1985 | 1463 | 17.2% |
| 1987 | 932 | 10.9% |
| 1999 | 930 | 10.9% |
| 1997 | 930 | 10.9% |
| 2004 | 930 | 10.9% |
| 2002 | 929 | 10.9% |
| 2009 | 928 | 10.9% |
| 2007 | 926 | 10.9% |
| 1998 | 555 | 6.5% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 1985 | 1463 | 17.2% |
| 1987 | 932 | 10.9% |
| 1997 | 930 | 10.9% |
| 1998 | 555 | 6.5% |
| 1999 | 930 | 10.9% |
| 2002 | 929 | 10.9% |
| 2004 | 930 | 10.9% |
| 2007 | 926 | 10.9% |
| 2009 | 928 | 10.9% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 2009 | 928 | 10.9% |
| 2007 | 926 | 10.9% |
| 2004 | 930 | 10.9% |
| 2002 | 929 | 10.9% |
| 1999 | 930 | 10.9% |
| 1998 | 555 | 6.5% |
| 1997 | 930 | 10.9% |
| 1987 | 932 | 10.9% |
| 1985 | 1463 | 17.2% |
Outlet_Size
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 66.7 KiB |

| Value | Count |
|---|---|
| 1 | 5203 |
| 2 | 2388 |
| 0 | 932 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 8523 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 5203 | 61.0% |
| 2 | 2388 | 28.0% |
| 0 | 932 | 10.9% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 5203 | 61.0% |
| 2 | 2388 | 28.0% |
| 0 | 932 | 10.9% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 5203 | 61.0% |
| 2 | 2388 | 28.0% |
| 0 | 932 | 10.9% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 8523 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 5203 | 61.0% |
| 2 | 2388 | 28.0% |
| 0 | 932 | 10.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 8523 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 5203 | 61.0% |
| 2 | 2388 | 28.0% |
| 0 | 932 | 10.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 8523 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 5203 | 61.0% |
| 2 | 2388 | 28.0% |
| 0 | 932 | 10.9% |
Outlet_Location_Type
Categorical
HIGH CORRELATION
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 66.7 KiB |

| Value | Count |
|---|---|
| 2 | 3350 |
| 1 | 2785 |
| 0 | 2388 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 8523 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 2 |
| 3rd row | 0 |
| 4th row | 2 |
| 5th row | 2 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 3350 | 39.3% |
| 1 | 2785 | 32.7% |
| 0 | 2388 | 28.0% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 3350 | 39.3% |
| 1 | 2785 | 32.7% |
| 0 | 2388 | 28.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 3350 | 39.3% |
| 1 | 2785 | 32.7% |
| 0 | 2388 | 28.0% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 8523 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 3350 | 39.3% |
| 1 | 2785 | 32.7% |
| 0 | 2388 | 28.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 8523 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 3350 | 39.3% |
| 1 | 2785 | 32.7% |
| 0 | 2388 | 28.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 8523 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 3350 | 39.3% |
| 1 | 2785 | 32.7% |
| 0 | 2388 | 28.0% |
Outlet_Type
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 66.7 KiB |

| Value | Count |
|---|---|
| 1 | 5577 |
| 0 | 1083 |
| 3 | 935 |
| 2 | 928 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 8523 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 2 |
| 3rd row | 1 |
| 4th row | 0 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 5577 | 65.4% |
| 0 | 1083 | 12.7% |
| 3 | 935 | 11.0% |
| 2 | 928 | 10.9% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 5577 | 65.4% |
| 0 | 1083 | 12.7% |
| 3 | 935 | 11.0% |
| 2 | 928 | 10.9% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 5577 | 65.4% |
| 0 | 1083 | 12.7% |
| 3 | 935 | 11.0% |
| 2 | 928 | 10.9% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 8523 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 5577 | 65.4% |
| 0 | 1083 | 12.7% |
| 3 | 935 | 11.0% |
| 2 | 928 | 10.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 8523 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 5577 | 65.4% |
| 0 | 1083 | 12.7% |
| 3 | 935 | 11.0% |
| 2 | 928 | 10.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 8523 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 5577 | 65.4% |
| 0 | 1083 | 12.7% |
| 3 | 935 | 11.0% |
| 2 | 928 | 10.9% |
Item_Outlet_Sales
Real number (ℝ≥0)
| Distinct | 3493 |
|---|---|
| Distinct (%) | 41.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2181.288914 |
| Minimum | 33.29 |
| Maximum | 13086.9648 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 66.7 KiB |
Quantile statistics
| Minimum | 33.29 |
|---|---|
| 5-th percentile | 188.4214 |
| Q1 | 834.2474 |
| median | 1794.331 |
| Q3 | 3101.2964 |
| 95-th percentile | 5522.811 |
| Maximum | 13086.9648 |
| Range | 13053.6748 |
| Interquartile range (IQR) | 2267.049 |
Descriptive statistics
| Standard deviation | 1706.499616 |
|---|---|
| Coefficient of variation (CV) | 0.7823354371 |
| Kurtosis | 1.615876681 |
| Mean | 2181.288914 |
| Median Absolute Deviation (MAD) | 1081.925 |
| Skewness | 1.177530603 |
| Sum | 18591125.41 |
| Variance | 2912140.938 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 958.752 | 17 | 0.2% |
| 1342.2528 | 16 | 0.2% |
| 1845.5976 | 15 | 0.2% |
| 703.0848 | 15 | 0.2% |
| 1278.336 | 14 | 0.2% |
| 1230.3984 | 14 | 0.2% |
| 1416.8224 | 13 | 0.2% |
| 1438.128 | 12 | 0.1% |
| 759.012 | 12 | 0.1% |
| 575.2512 | 12 | 0.1% |
| Other values (3483) | 8383 | 98.4% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 33.29 | 2 | < 0.1% |
| 33.9558 | 1 | < 0.1% |
| 34.6216 | 1 | < 0.1% |
| 35.2874 | 1 | < 0.1% |
| 36.619 | 2 | < 0.1% |
| 37.2848 | 1 | < 0.1% |
| 37.9506 | 5 | 0.1% |
| 38.6164 | 2 | < 0.1% |
| 39.948 | 2 | < 0.1% |
| 40.6138 | 2 | < 0.1% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 13086.9648 | 1 | < 0.1% |
| 12117.56 | 1 | < 0.1% |
| 11445.102 | 1 | < 0.1% |
| 10993.6896 | 1 | < 0.1% |
| 10306.584 | 1 | < 0.1% |
| 10256.649 | 1 | < 0.1% |
| 10236.675 | 1 | < 0.1% |
| 10072.8882 | 1 | < 0.1% |
| 9779.9362 | 1 | < 0.1% |
| 9678.0688 | 1 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
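As a minimal sketch of that calculation in plain Python (the function name `pearson_r` and the sample data are illustrative, and this is not the code the profiling tool runs):

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/n (or 1/(n-1)) factors cancel, so raw sums are enough.
    return cov / math.sqrt(var_x * var_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
print(pearson_r(x, y))                       # perfectly linear data
print(pearson_r(x, [3 * v + 7 for v in y]))  # rescaled and shifted y
```

Both calls give the same result, which illustrates the invariance under changes in location and scale mentioned above.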
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
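The rank-based definition can be sketched as follows: rank each variable, then apply the Pearson formula to the ranks. Giving tied values the average of their positions is a common convention and is assumed here; all names are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance over the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]       # monotonic but not linear
print(spearman_rho(x, y))     # the ranks agree exactly
print(pearson_r(x, y) < 1.0)  # True: the linear measure falls short of 1
```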
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
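The pair-counting description corresponds to the variant usually called tau-a. A minimal sketch is below; the function name is illustrative, and libraries may compute a tie-adjusted variant instead (for example, `scipy.stats.kendalltau` defaults to tau-b), so results can differ when ties are present.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 is a tie in x or y and counts toward neither group.
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair out of six
print(kendall_tau_a(x, y))
```

With five concordant pairs and one discordant pair out of six, the example evaluates to 4/6.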
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for the phik package.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. The report uses a bias-corrected measure proposed by Bergsma in 2013.
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
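Nullity by column is simply the number of missing cells in each column. A minimal sketch, assuming records are available as dictionaries; the rows below are hypothetical and are not taken from the profiled dataset, for which the report lists 0 missing cells.

```python
def nullity_by_column(rows):
    """Count missing (None or NaN) cells per column for a list of dict records."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            is_missing = value is None or value != value  # NaN is not equal to itself
            counts[column] = counts.get(column, 0) + int(is_missing)
    return counts

# Hypothetical records used only to exercise the function.
rows = [
    {"Item_Weight": 9.3, "Outlet_Size": 1},
    {"Item_Weight": float("nan"), "Outlet_Size": None},
    {"Item_Weight": 17.5, "Outlet_Size": 1},
]
print(nullity_by_column(rows))  # {'Item_Weight': 1, 'Outlet_Size': 1}
```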
First rows
| | Item_Identifier | Item_Weight | Item_Fat_Content | Item_Visibility | Item_Type | Item_MRP | Outlet_Identifier | Outlet_Establishment_Year | Outlet_Size | Outlet_Location_Type | Outlet_Type | Item_Outlet_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 156 | 9.300000 | 1 | 0.016047 | 4 | 249.8092 | 9 | 1999 | 1 | 0 | 1 | 3735.1380 |
| 1 | 8 | 5.920000 | 2 | 0.019278 | 14 | 48.2692 | 3 | 2009 | 1 | 2 | 2 | 443.4228 |
| 2 | 662 | 17.500000 | 1 | 0.016760 | 10 | 141.6180 | 9 | 1999 | 1 | 0 | 1 | 2097.2700 |
| 3 | 1121 | 19.200000 | 2 | 0.000000 | 6 | 182.0950 | 0 | 1998 | 1 | 2 | 0 | 732.3800 |
| 4 | 1297 | 8.930000 | 1 | 0.000000 | 9 | 53.8614 | 1 | 1987 | 0 | 2 | 1 | 994.7052 |
| 5 | 758 | 10.395000 | 2 | 0.000000 | 0 | 51.4008 | 3 | 2009 | 1 | 2 | 2 | 556.6088 |
| 6 | 696 | 13.650000 | 2 | 0.012741 | 13 | 57.6588 | 1 | 1987 | 0 | 2 | 1 | 343.5528 |
| 7 | 738 | 12.857645 | 1 | 0.127470 | 13 | 107.7622 | 5 | 1985 | 1 | 2 | 3 | 4022.7636 |
| 8 | 440 | 16.200000 | 2 | 0.016687 | 5 | 96.9726 | 7 | 2002 | 1 | 1 | 1 | 1076.5986 |
| 9 | 990 | 19.200000 | 2 | 0.094450 | 5 | 187.8214 | 2 | 2007 | 1 | 1 | 1 | 4710.5350 |
Last rows
| | Item_Identifier | Item_Weight | Item_Fat_Content | Item_Visibility | Item_Type | Item_MRP | Outlet_Identifier | Outlet_Establishment_Year | Outlet_Size | Outlet_Location_Type | Outlet_Type | Item_Outlet_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8513 | 449 | 12.000 | 2 | 0.020407 | 10 | 99.9042 | 6 | 2004 | 2 | 1 | 1 | 595.2252 |
| 8514 | 145 | 15.000 | 2 | 0.054489 | 3 | 57.5904 | 7 | 2002 | 1 | 1 | 1 | 468.7232 |
| 8515 | 445 | 20.700 | 1 | 0.021518 | 0 | 157.5288 | 3 | 2009 | 1 | 2 | 2 | 1571.2880 |
| 8516 | 1356 | 18.600 | 1 | 0.118661 | 11 | 58.7588 | 3 | 2009 | 1 | 2 | 2 | 858.8820 |
| 8517 | 389 | 20.750 | 4 | 0.083607 | 5 | 178.8318 | 8 | 1997 | 2 | 0 | 1 | 3608.6360 |
| 8518 | 370 | 6.865 | 1 | 0.056783 | 13 | 214.5218 | 1 | 1987 | 0 | 2 | 1 | 2778.3834 |
| 8519 | 897 | 8.380 | 2 | 0.046982 | 0 | 108.1570 | 7 | 2002 | 1 | 1 | 1 | 549.2850 |
| 8520 | 1357 | 10.600 | 1 | 0.035186 | 8 | 85.1224 | 6 | 2004 | 2 | 1 | 1 | 1193.1136 |
| 8521 | 681 | 7.210 | 2 | 0.145221 | 13 | 103.1332 | 3 | 2009 | 1 | 2 | 2 | 1845.5976 |
| 8522 | 50 | 14.800 | 1 | 0.044878 | 14 | 75.4670 | 8 | 1997 | 2 | 0 | 1 | 765.6700 |